Project Description

Lego is a well-known brand worldwide, famous for its wide range of toys, popular movies, and successful video games. In this project, we will explore an important moment in Lego’s history: the introduction of licensed sets, such as Star Wars, Super Heroes, and Harry Potter.

The launch of Lego’s first licensed series, Star Wars, was very successful and led to many more collaborations with other popular themes. We will imagine that the partnerships team has asked us to analyze this success. Before we begin the analysis, we will review the descriptions of the two datasets we will use, shown below.

The Data

lego_sets.csv

Column	Description
`set_num`	A unique code for each Lego set. This is very important because a missing value means the set is a duplicate or invalid.
`name`	The name of the Lego set.
`year`	The year the set was released.
`num_parts`	The number of pieces in the set. This is not critical for our analysis, so missing values are okay.
`theme_name`	The name of the sub-theme the set belongs to.
`parent_theme`	The name of the main theme the set belongs to. This matches the `name` column in the parent_themes dataset.

parent_themes.csv

Column	Description
`id`	A unique code for each parent theme.
`name`	The name of the parent theme.
`is_licensed`	A Boolean value showing if the theme is licensed or not.

The Rebrickable dataset contains information about every Lego set ever sold, including set names and the bricks they contain. Although Lego bricks are small, this is a large and rich dataset. In this project, we will use this data along with the pandas library to explore the history of Lego’s licensed sets. We will also calculate what percentage of all licensed sets are themed around Star Wars.
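
The column descriptions above say that `parent_theme` in lego_sets.csv matches `name` in parent_themes.csv. A minimal sketch of that relationship, using small made-up tables rather than the real files, is a left join that attaches the licensing flag to each set. The toy values and the variable names `sets`, `themes`, and `merged` are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (illustrative values, not real data)
sets = pd.DataFrame({
    "set_num": ["A-1", "B-1", "C-1"],
    "parent_theme": ["Star Wars", "Town", "Harry Potter"],
})
themes = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Star Wars", "Town", "Harry Potter"],
    "is_licensed": [True, False, True],
})

# A left join keeps every set and attaches the flag of its parent theme
merged = sets.merge(
    themes[["name", "is_licensed"]],
    left_on="parent_theme",
    right_on="name",
    how="left",
)
print(merged[["set_num", "parent_theme", "is_licensed"]])
```

A join like this is one possible alternative to filtering with `isin`; it may be convenient when the licensing flag is needed as a column for later grouping.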

This project will help us understand how licensed partnerships shaped Lego’s product line and contributed to its success.

What percentage of all licensed sets ever released were Star Wars themed?

In [2]:

# Import pandas and read in the DataFrame
import pandas as pd
lego_sets = pd.read_csv('lego_sets.csv')
lego_sets.head()

Out[2]:

	set_num	name	year	num_parts	theme_name	parent_theme
0	00-1	Weetabix Castle	1970	471.0	Castle	Legoland
1	0011-2	Town Mini-Figures	1978	NaN	Supplemental	Town
2	0011-3	Castle 2 for 1 Bonus Offer	1987	NaN	Lion Knights	Castle
3	0012-1	Space Mini-Figures	1979	12.0	Supplemental	Space
4	0013-1	Space Mini-Figures	1979	12.0	Supplemental	Space

In [3]:

# Drop rows that are missing set_num, name, or theme_name
lego_sets_clean = lego_sets.dropna(subset=['set_num', 'name', 'theme_name'])
lego_sets_clean.head()

Out[3]:

	set_num	name	year	num_parts	theme_name	parent_theme
0	00-1	Weetabix Castle	1970	471.0	Castle	Legoland
1	0011-2	Town Mini-Figures	1978	NaN	Supplemental	Town
2	0011-3	Castle 2 for 1 Bonus Offer	1987	NaN	Lion Knights	Castle
3	0012-1	Space Mini-Figures	1979	12.0	Supplemental	Space
4	0013-1	Space Mini-Figures	1979	12.0	Supplemental	Space
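
The first five rows look the same before and after cleaning, so `head()` alone does not show what the `dropna` step removed. One way to check, sketched here on a made-up three-row table (the `toy` data and variable names are hypothetical), is to count missing values per column before and after dropping.

```python
import numpy as np
import pandas as pd

# Made-up data: one row lacks set_num, two rows lack num_parts
toy = pd.DataFrame({
    "set_num": ["A-1", None, "C-1"],
    "name": ["X", "Y", "Z"],
    "num_parts": [10.0, np.nan, np.nan],
})

# Count missing values per column before cleaning
missing_before = toy.isna().sum()

# Drop only the rows where the critical column is missing
cleaned = toy.dropna(subset=["set_num"])

# Missing num_parts values are tolerated, so some can remain
missing_after = cleaned.isna().sum()

print(missing_before)
print(missing_after)
```

Comparing `len(toy)` with `len(cleaned)` gives the number of rows removed.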

In [4]:

# Get the names of the licensed parent themes
parent_themes = pd.read_csv('parent_themes.csv')
licensed_themes = parent_themes[parent_themes['is_licensed']]['name']
licensed_themes.head()

Out[4]:

7                    Star Wars
12                Harry Potter
16    Pirates of the Caribbean
17               Indiana Jones
18                        Cars
Name: name, dtype: object

In [5]:

# Subset for licensed sets
licensed = lego_sets_clean['parent_theme'].isin(licensed_themes)
licensed_sets = lego_sets_clean[licensed]
licensed_sets.head()

Out[5]:

	set_num	name	year	num_parts	theme_name	parent_theme
44	10018-1	Darth Maul	2001	1868.0	Star Wars	Star Wars
45	10019-1	Rebel Blockade Runner - UCS	2001	NaN	Star Wars Episode 4/5/6	Star Wars
54	10026-1	Naboo Starfighter - UCS	2002	NaN	Star Wars Episode 1	Star Wars
57	10030-1	Imperial Star Destroyer - UCS	2002	3115.0	Star Wars Episode 4/5/6	Star Wars
95	10075-1	Spider-Man Action Pack	2002	25.0	Spider-Man	Super Heroes

In [6]:

# Calculate the percentage of licensed sets that are Star Wars themed
all_sets = len(licensed_sets)
star_wars_sets = len(licensed_sets[licensed_sets['parent_theme'] == 'Star Wars'])
ratio = star_wars_sets / all_sets
the_force = int(ratio * 100)  # int() truncates to a whole percent
print(f'The percentage of licensed sets that are Star Wars themed is {the_force}%.')

The percentage of licensed sets that are Star Wars themed is 51%.
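
As a cross-check on a calculation like the one above, the share of each theme can also be read from `value_counts(normalize=True)`. The sketch below uses made-up counts (five Star Wars sets out of nine), not the real data, and also shows that `int()` truncates while `round()` rounds to the nearest whole number.

```python
import pandas as pd

# Made-up licensed sets: 5 Star Wars, 3 Harry Potter, 1 Super Heroes
toy = pd.DataFrame({
    "parent_theme": ["Star Wars"] * 5 + ["Harry Potter"] * 3 + ["Super Heroes"],
})

# Fraction of rows per theme, then the Star Wars share as a percentage
shares = toy["parent_theme"].value_counts(normalize=True)
star_wars_share = float(shares["Star Wars"] * 100)

truncated = int(star_wars_share)   # drops the fractional part
rounded = round(star_wars_share)   # rounds to the nearest integer

print(f"{star_wars_share:.2f}% -> truncated {truncated}%, rounded {rounded}%")
```

Which of the two to report is a presentation choice; the two results differ whenever the fractional part is 0.5 or more.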

In which year was the highest number of Star Wars sets released?

In [7]:

# Create a pivot table of sets released by theme per year
licensed_pivot = licensed_sets.pivot_table(index='year', columns='parent_theme', values='set_num', aggfunc='count')

# Find the year when the most Star Wars sets were released
new_era = int(licensed_pivot["Star Wars"].idxmax())
print(f'The year when the most Star Wars sets were released was {new_era}.')

The year when the most Star Wars sets were released was 2016.
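
The peak year can also be derived without a pivot table, by filtering to one theme and counting rows per year. The sketch below is a minimal illustration on made-up data; the `toy` table and the names `sets_per_year`, `peak_year`, and `peak_count` are assumptions, not part of the original analysis.

```python
import pandas as pd

# Made-up licensed sets across three years
toy = pd.DataFrame({
    "year": [2015, 2016, 2016, 2016, 2017, 2017, 2016],
    "parent_theme": [
        "Star Wars", "Star Wars", "Star Wars", "Harry Potter",
        "Star Wars", "Harry Potter", "Harry Potter",
    ],
})

# Keep only Star Wars rows, then count sets per release year
star_wars = toy[toy["parent_theme"] == "Star Wars"]
sets_per_year = star_wars.groupby("year").size()

# idxmax returns the index label (the year) of the largest count
peak_year = int(sets_per_year.idxmax())
peak_count = int(sets_per_year.max())

print(f"Peak year: {peak_year} with {peak_count} sets")
```

If two years tie for the maximum, `idxmax` returns the first one in index order, so ties are worth checking separately.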